EXHIBIT C 



Biochem. J. (1986) 240, 1-12 (Printed in Great Britain) 






REVIEW ARTICLE 

The purification of eukaryotic polypeptides synthesized in 
Escherichia coli 

Fiona A. O. MARSTON 

Protein Biochemistry Department, Celltech Ltd., 244-250 Bath Road, Slough, Berks. SL1 4DY, U.K. 



INTRODUCTION 

Over the last 13 years, manipulation of DNA in vitro 
has developed from the transfer of genetic information 
between prokaryotic organisms (Cohen et al., 1973) to a 
technology which facilitates efficient and controlled 
production of proteins in foreign hosts. A significant 
feature of these developments is the ability to express 
eukaryotic genes in prokaryotes such as Escherichia coli 
(Harris, 1983; Wetzel & Goeddel, 1983). The supply of 
many eukaryotic polypeptides which have potential 
clinical or industrial use is often limited by their low 
natural availability. Gene cloning and expression in 
E. coli can provide a more abundant source of these 
polypeptides. 

The mode of gene expression affects the location of the 
proteins produced. The proteins may either be located in 
the cytoplasm of E. coli or secreted through the cell 
membrane. Eukaryotic genes cloned in frame with 
synthetic or bacterial nucleic acid sequences can be 
expressed as hybrid products in the cell cytoplasm. 
Transcription, from bacterial promoters, and translation, 
yield fusion proteins which include bacterial or synthetic 
polypeptide sequences in addition to the eukaryotic 
polypeptide. An alternative approach which locates 
proteins in the cytoplasm is direct expression, where 
bacterial promoters and terminators are used in the 
transcription of the foreign gene alone. In E. coli an 
ATG, or occasionally a GTG, sequence must precede 
the gene coding sequence, for translation initiation. Thus 
the primary products of translation possess an N-terminal
methionine residue. E. coli possesses enzymes which 
catalyse the efficient removal of the methionine residues 
from natural proteins when required, but these enzymes 
do not work with the same efficiency on recombinant 
polypeptides and therefore directly expressed proteins 
may possess an unnatural N-terminal methionine residue.
Finally, gene sequences which include a leader or signal 
sequence cloned in frame with the eukaryotic genes, 
when transcribed and translated can direct secretion of 
the eukaryotic polypeptides through the bacterial cell 
membrane. 

From the increasing number of reports of eukaryotic 
polypeptide synthesis in E. coli it is clear that the mode 
of expression affects not only the efficiency of production, 
but the nature of the polypeptide product itself. In 
general, recombinant polypeptides accumulate to higher 
levels of total cell protein when expressed intracellularly 
than when secreted, but many of the polypeptide 



products located in the cytoplasm are insoluble and 
aggregated. The consequent isolation and purification 
techniques required are the subject of this Review. 



INTRACELLULAR EXPRESSION 

Genes expressed directly or as fusion proteins in the 
cytoplasm of E. coli characteristically accumulate to 
levels ranging up to 25% of total cell protein (Table 1). 
However, in the majority of cases, the expressed proteins 
are in an insoluble form (Harris, 1983). This was not 
expected, as the authentic proteins are naturally 
produced in soluble forms. The appearance of inclusion 
bodies in E. coli in parallel with the accumulation of 
proinsulin, insulin A chain or insulin B chain (Williams 
et al., 1982) was the first indication that the insoluble 
proteins might accumulate in a discrete form. By isolating 
inclusion bodies from cells expressing prochymosin it was 
demonstrated that these inclusions were indeed predominantly
composed of recombinant protein (Marston et al.,
1984). A number of eukaryotic polypeptides expressed in
E. coli directly, e.g. bovine growth hormone, salmon
growth hormone, IFN-β, IFN-γ, IL-2 and Protein C, or
as fusion proteins, e.g. proinsulin, myoglobin and
β-globin, have now been shown to exist as aggregates or
inclusion bodies (see Table 1 for references). 

In the phase contrast microscope, inclusion bodies are 
seen to be highly refractile, while transmission electron 
microscopy reveals them as amorphous aggregates not 
enclosed or in contact with a distinct membrane 
(Schoemaker et al., 1985; Schoner et al., 1985). Electron 
micrographs show isolated inclusions as spherical in 
shape (Fig. 1a), although this may not be the
conformation in vivo, because of the preparation
procedures used before microscopy. Contaminating
material can be seen, associated with the isolated
inclusion bodies (Figs. 1a and 1b).

There is no direct evidence to indicate why eukaryotic 
polypeptides are sequestered into inclusion bodies in 
E. coli. The accumulation of abnormal E. coli proteins in 
intracellular granules was demonstrated some years ago 
(Prouty & Goldberg, 1972; Prouty et al., 1975). 
However, normal E. coli proteins synthesized to high 
levels using recombinant DNA techniques can also 
accumulate in insoluble forms (Gribskov & Burgess, 
1983; Botterman & Zabeau, 1985) and as inclusion 
bodies (Cheng, 1983). It is therefore not simply a 
response to 'foreign' proteins. One interpretation is that 



Abbreviations used: AIDS, acquired immune deficiency syndrome; BPV, bovine papilloma virus; CAT, chloramphenicol acetyltransferase; EGF, 
epidermal growth factor; FMDV, foot and mouth disease virus; HBV, hepatitis B virus; HSV, herpes simplex virus; IFN, interferon; IGF, insulin-like 
growth factor; IL, interleukin; TNF, tumour necrosis factor. 
Table 1. Properties of some eukaryotic polypeptides located in the cytoplasm of E. coli

Polypeptide | Mr (× 10⁻³) of authentic non-glycosylated polypeptide | Mode of expression | Expression (%) | Location on cell lysis† | No. of cysteine residues | References
Somatostatin | 1.5 | Fusion | <0.05 | Pellet | 2 | Itakura et al. (1977)
Insulin A chain | 2.0 | Fusion | 20 | Pellet | 4 | Goeddel et al. (1979b)
Insulin B chain | 3.0 | Fusion | 20 | Pellet | 2 | Goeddel et al. (1979b)
Calcitonin | 3.5 | Fusion | 17 | Pellet | 2 | Bennett et al. (1984)
β-Endorphin | 3.9 | Fusion | 5 | Pellet | 0 | Shine et al. (1980)
Urogastrone | 7.5 | Fusion | NE‡ | Pellet | 6 | Sassenfeld & Brewer (1984)
T4 Reg A | 14.6 | Fusion | NE | Pellet | 1 | Adari et al. (1985)
β-Globin | 16 | Fusion | 5-10 | Pellet | 2 | Nagai et al. (1985)
Myoglobin | 17 | Fusion | 10 | Pellet | 0 | Varadarajan et al. (1985)
Bovine growth hormone | 22 | Fusion | 5 | Pellet | 2 | Seeburg et al. (1978)
Human growth hormone | 22 | Fusion | 5 | Pellet | 2 | Szoka et al. (1986)
*α1-Antitrypsin | 45 | Fusion | 15 | (Supernatant) | 1 | Courtney et al. (1984)
*Complement C5a | 8.3 | Direct | 0.007 | (Supernatant) | 7 | Mandecki et al. (1985)
*Interleukin-2 | 12-17 | Direct | 10 | (Supernatant) | 3 | Devos et al. (1983)
*AIDS peptide 121 | 15 | Direct | 5-10 | Pellet | 2 | Chang et al. (1985)
Human TNF | 17 | Direct | 15 | Supernatant | 2 | Pennica et al. (1984)
Murine TNF | 17 | Direct | 24 | Supernatant | 2 | Pennica et al. (1985)
*Interferon γ | 18 | Direct | 25 | (Supernatant) | 2 | Simons et al. (1984)
*Human lymphotoxin | 18 | Direct | NE | Supernatant | 0 | Gray et al. (1984)
Interferon α | 19.4 | Direct | NE | Supernatant | 5 | Staehelin et al. (1981); Wetzel et al. (1981)
*Interferon β | 22-26 | Direct | 15 | Pellet | 3 | Whitehorn et al. (1985)
Growth hormone (bovine) | 22 | Direct | NE | Pellet | 2 | George et al. (1985)
Growth hormone (human) | 22 | Direct | NE | Supernatant | 2 | Goeddel et al. (1979a)
Growth hormone (salmon) | 22 | Direct | 15 | Pellet | 2 | Sekine et al. (1985)
κ-Light chain (IgG) | 24 | Direct | 0.5 | Pellet | 5 | Cabilly et al. (1984)
AIDS p24 gag | 24 | Direct | NE | Supernatant | 0 | Dowbenko et al. (1985)
Apolipoprotein E | 34.2 | Direct | 1 | Pellet | 1 | Vogel et al. (1985)
Calf prochymosin | 43 | Direct | 8 | Pellet | 6 | Marston et al. (1984)
*γ-Heavy chain (IgG) | 49 | Direct | 3 | Pellet | 13 | Cabilly et al. (1984)
*Protein C | 50 | Direct | (25% of insoluble protein) | Pellet | 24 | Hoskins et al. (1985)
Triosephosphate isomerase | 53 | Direct | 0.3 | Supernatant | 4 | Straus & Gilbert (1985)
Urokinase | 54 | Direct | NE | Pellet | 24 | Winkler et al. (1985)
MMLV reverse transcriptase | 80 | Direct | 20 | (Supernatant) | 8 | Kotewicz et al. (1985)

* Polypeptides that are naturally glycosylated in vivo.
† Parentheses indicate that only a proportion of the activity is located in the supernatant.
‡ NE, not estimated.


proteins aggregate when they are synthesized at such a 
high rate that the cells' degradation systems become 
saturated (Prouty et al., 1975). This cannot explain why 
prochymosin expressed at low levels is also insoluble 
(Schoemaker et al., 1985). What is evident, however, is
that stringent chemical conditions, as described below, 
are required to solubilize recombinant proteins from 
inclusion bodies. Formation is therefore not just a 
precipitation phenomenon resulting from the accumula- 
tion of a high concentration of protein. Precipitation may 
be an initial effect, but at some stage ionic, hydrophobic 
or covalent interactions form between the protein 
molecules. 

When recombinant polypeptides are insoluble, purifi- 
cation does not just involve the application of chromato- 
graphic separation techniques, the objective being also 
to recover active, soluble protein. Fig. 2 illustrates 



diagrammatically procedures that can be used to 
solubilize aggregated recombinant polypeptides. For 
directly expressed proteins, (Fig. 2a), the first stage is to 
isolate the inclusion bodies. Then denaturants are used 
to unfold the polypeptides and finally conditions are 
adjusted to allow the polypeptides to refold correctly. 
For fusion proteins it may in addition be necessary to 
cleave the fusion to isolate the recombinant polypeptide 
(Fig. 2b, stage 3). A varying degree of purification can be 
achieved during these procedures, and once the poly- 
peptides are solubilized conventional chromatographic 
techniques can be applied. 

The fact that recombinant polypeptides aggregate in a 
discrete form is useful for the purposes of purification. As 
first noted for abnormal E. coli proteins (Prouty et al.,
1975), inclusion bodies are dense and sediment readily 
with low speed centrifugation. Speeds as low as 500 g are 
Fig. 1. Electron micrographs of prochymosin inclusion bodies
isolated from E. coli

Sample preparation: inclusion bodies were isolated from
E. coli HB101/pCT70 as described in Marston et al.
(1984). The inclusion bodies were fixed first in 2.5%
glutaraldehyde then in 1% OsO4. After dehydration
through a series of alcohols, the inclusion bodies were
embedded in Spurr's resin. Sections 60 nm thick were cut
using an LKB Ultratome III. Sections were stained using
uranyl acetate (30 min, 60 °C) and lead citrate (10 min,
room temperature). (a) Transmission electron micrograph
of water-washed isolated inclusion bodies in suspension.
Magnification x 63000. (b) Transmission electron micrograph
of a thin section through a pellet of inclusion bodies.
Magnification x 63000. (Provided by and reproduced with
the permission of Richard Sugrue, Ray Newsam and the
University of Kent Electron Microscopy Unit, Canterbury,
U.K.).



reported to sediment recombinant inclusion bodies 
(Olsen, 1985) but values of 5000-12000 g are more 
generally used (Marston et al., 1984; George et al., 1985;
Schoner et al., 1985). Under these conditions, the
inclusion bodies sediment more rapidly than the bulk of
the cell debris and purification is achieved. However,
contaminating proteins do co-purify with the inclusions,
as illustrated for prochymosin (Fig. 3). Detergent
(Marston et al., 1984) or urea (Koths et al., 1985;
Schoner et al., 1985) can be used to solubilize the
microbial proteins preferentially, leaving the recombin- 
ant proteins between 30% and 90% pure. The usefulness 
of this procedure is underlined by the fact that 
techniques have been developed to enhance aggregation. 
When a proportion of the human growth hormone 
expressed in E. coli was found to partition in the soluble 
fraction, methods normally used to kill cells, such as heat 
treatment (60-80 °C), acid treatment, or phenol plus 
toluene, were used to increase recovery of the recombin- 
ant protein in an aggregated form (Olsen, 1985). 



Fusion proteins 

The intracellular accumulation of a eukaryotic poly- 
peptide expressed directly in E. coli may be limited 
because the protein is recognized as foreign and 
degraded. This is particularly apparent with small 
polypeptides (Wetzel & Goeddel, 1983). Expression 
levels can be improved by linking the eukaryotic gene 
with a bacterial gene and producing a fused protein 
product, as demonstrated for somatostatin (Itakura 
et al., 1977), insulin (Goeddel et al., 1979b) and
β-endorphin (Shine et al., 1980). Fusion proteins can be
expressed to levels of up to 26% of total cell protein
(Table 1), but if the bacterial gene constitutes a large
proportion of the fusion, then the amount of eukaryotic
product will be small. There are many examples of fusion
proteins which accumulate in an insoluble form (Table 1)
despite the fact that the eukaryotic sequence is a small
proportion of the total protein sequence; typical
examples being β-endorphin (31 amino acids) fused to
β-galactosidase (Shine et al., 1980) and calcitonin-Gly (32
amino acids) fused to CAT (Bennett et al., 1984).
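
The dilution of the eukaryotic product by a large bacterial partner can be put in rough numbers. The sketch below is an illustration only: it uses residue counts as a stand-in for mass, and the carrier length of about 1000 residues is an assumption chosen for the example, not a figure taken from this Review. The function names are hypothetical.

```python
def eukaryotic_fraction(target_residues: int, carrier_residues: int) -> float:
    """Return the share of a fusion protein's residues that belong to the target."""
    if target_residues <= 0 or carrier_residues < 0:
        raise ValueError("target must be > 0 residues and carrier >= 0 residues")
    return target_residues / (target_residues + carrier_residues)


def eukaryotic_yield(fusion_percent_of_cell_protein: float, fraction: float) -> float:
    """Scale the fusion expression level (% of cell protein) by the target's share."""
    return fusion_percent_of_cell_protein * fraction


if __name__ == "__main__":
    # 31 residues of beta-endorphin is from the text; the ~1000-residue carrier
    # is an ASSUMPTION for illustration only.
    share = eukaryotic_fraction(31, 1000)
    # 5% is the Table 1 expression level listed for the beta-endorphin fusion.
    print(f"target share of fusion: {share:.1%}")
    print(f"target as % of cell protein: {eukaryotic_yield(5.0, share):.2f}")
```

Under these assumptions the eukaryotic sequence is only about 3% of the fusion, so a fusion expressed at 5% of cell protein would carry roughly 0.15% of cell protein as the desired peptide.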

Another fusion method is one in which multiple copies 
of the gene are linked in tandem (Shen, 1984). 
Expression levels were increased when two or more 
linked proinsulin genes were expressed directly or in 
conjunction with a small part of the N-terminus of
β-galactosidase. The stability of the two products was
similar, but the yield of the β-galactosidase fusion was
3-fold greater than that of the directly expressed
product. Effects at the level of transcription, translation
or on mRNA stability were suggested. Enhanced
expression of proinsulin in E. coli by the N-terminal
addition of short homo-oligopeptides was described in a 
recent report (Sung et al., 1986). Seven of the 20 
oligomers studied were particularly effective, but the 
mechanism by which accumulation was affected was not 
established. 

Fusion proteins may be purified and used without 
further modification. The isolation of inclusion bodies 
can be utilized as a purification step for insoluble 
proteins (Kleid et al., 1981; Pilacinski et al., 1984;
Cabradilla et al., 1986). Further purification methods
used include preparative electrophoresis (Kleid et al.,
1981), immunopurification (Liu et al., 1984), gel
filtration and ion-exchange chromatography (Cabradilla
et al., 1986). In order to maintain the proteins in a
soluble form for column chromatography, detergents or
denaturants may be included in the buffers (Liu et al.,
1984; Cabradilla et al., 1986). Purified intact fusion
proteins have been used in the development of vaccines
for FMDV (VP1), BPV and cholera toxin (Kleid et al.,
1981; Pilacinski et al., 1984; Jacob et al., 1985), in the
development of diagnostic kits for the AIDS retrovirus
(Cabradilla et al., 1986) and to demonstrate biological
activity of the Fc portion of IgE (Liu et al., 1984).

When the eukaryotic polypeptide is required in 
isolation from the fusion protein the strategy used is to 
place a cleavage site between the C-terminus of the 
prokaryotic sequence and the N-terminus of the
eukaryotic coding sequence; Table 2 lists some typical 
examples. A schematic illustration of a typical isolation 
and cleavage protocol for fusion proteins is given in Fig. 
2. Inclusion bodies are isolated and denatured before 
cleavage. Chemical cleavage can be effected using CNBr, 
which cleaves on the C-terminal side of the methionine 
residues. This method was used in the production of 
Fig. 2. Diagram of stages in the recovery of active proteins from the cytoplasm of E. coli

(a) Direct expression. Stage 1, cell lysis and isolation of inclusion bodies. Stage 2, denaturation. Stage 3, refolding. The example
illustrated is a disulphide-containing protein. R represents either -H or -SO3⁻. (b) Fusion proteins. Stages 1 and 2 as for
(a). Stage 3, cleavage ( v ) to release the recombinant polypeptide. Stage 4, refolding; ■■ represents the eukaryotic portion
of the fusion protein.



β-galactosidase-insulin A chain and β-galactosidase-
insulin B chain fusions (Goeddel et al., 1979b). Both
products were insoluble and centrifugation was therefore
used as a purification step. Denaturation using guanidinium
chloride in the presence of β-mercaptoethanol was
necessary before CNBr cleavage. The individual chains
were purified further by using ion-exchange chromatography,
reverse phase h.p.l.c. and gel filtration.
S-Sulphonated derivatives of the chains mixed and
re-oxidized yielded biologically active insulin. The 



tandem linked pro-insulin genes expressed by Shen 
(1984) were each separated by the sequence -Arg-Arg- 
Asn-Ser-Met-. The intervening sequences were cleaved 
after the methionine residues by CNBr treatment. This 
method is limited in its use, since most proteins are likely 
to contain methionine residues. Also, it is necessary for 
the polypeptides to be acid-stable as cleavage is 
performed in 70% formic acid. 
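
The logic of CNBr cleavage, and of checking that a product is free of internal sites, can be sketched in a few lines. This is a toy model only: it treats cleavage as a clean split after every methionine in a one-letter sequence and ignores the chemical modification of the methionine itself. The five-residue "product" sequence is hypothetical; only the -Arg-Arg-Asn-Ser-Met- linker (RRNSM) comes from the text.

```python
def cnbr_fragments(sequence: str) -> list[str]:
    """Split a one-letter protein sequence after every methionine (M)."""
    fragments: list[str] = []
    current = ""
    for residue in sequence.upper():
        current += residue
        if residue == "M":
            # Cleavage is on the C-terminal side, so M ends the fragment.
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def is_cleavage_safe(product: str, cleaved_after: str = "M") -> bool:
    """True if the product contains no internal copy of the cleaved residue."""
    return cleaved_after not in product.upper()


if __name__ == "__main__":
    product = "GIVEQ"   # HYPOTHETICAL toy product, not a real sequence
    linker = "RRNSM"    # linker described for the tandem proinsulin genes
    tandem = product + linker + product
    print(cnbr_fragments(tandem))
    print(is_cleavage_safe(product))
```

In this simplified picture each released unit still carries the linker residues at its C-terminus, and any product containing methionine would itself be fragmented, which is the limitation noted above.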

A unique cleavage site, not present in the coding 
sequence of the recombinant gene, is the ideal; 
Fig. 3. Coomassie Blue stained SDS/polyacrylamide-gel profile
of total E. coli proteins and isolated inclusions

Lane a, total cell proteins from E. coli HB101/pCT54
(control); lane b, total cell proteins from E. coli
HB101/pCT70 (prochymosin producer). Lanes c-f were
all from HB101/pCT70. Lane c, lysis pellet; lane d, lysis
supernatant; lane e, Triton X-100/EDTA-washed pellet;
lane f, Triton X-100/EDTA-wash supernatant. Mr
markers are indicated on the left. pC, prochymosin.



otherwise, the recombinant product will also be cleaved. 
Use of a lysine link in a TrpE-EGF fusion was possible 
because EGF contains no lysine residues (Allen et al.,
1985). Isolated inclusion bodies were solubilized in 
8 M-urea, and after dilution, the EGF was released from 
the fusion protein by digestion with endoproteinase Lys 
C. Gel filtration and ion-exchange chromatography 
yielded EGF 80-90% pure. Batch variation in the 
activity of endoproteinase Lys C rendered this cleavage 
procedure irreproducible. TrpE-EGF fusion proteins 
containing acid-labile (-Asp-Pro-), factor Xa- and 
collagenase-sensitive linking regions were also described, 
but little if any EGF was released from any of these 
fusions using the appropriate cleavage conditions. If the 
cleavage site is not unique, then the internal lysis sites can 
be protected. This was possible with a β-galactosidase-
β-endorphin fusion (Shine et al., 1980) where internal
lysine residues were reversibly blocked by citraconylation
and trypsin was used to cleave after the -Lys-Arg-
residues immediately before the N-terminus. After
deblocking of the lysines, a biologically active product 
was recovered. However, this blocking and deblocking 
approach is not generally satisfactory. 

An insoluble TrpE-bovine growth hormone fusion 
was constructed with an acid-labile Asp-Pro cleavage site 
Table 2. Cleavage sites engineered into fusion proteins in E. coli

* indicates the peptide bond cleaved.

Sequence recognized | Cleavage effector | Reference
-Asp*Pro- | Acid pH | Szoka et al. (1986)
-Met*Xaa- | CNBr | Goeddel et al. (1979b)
-Arg*Xaa- or -Lys*Xaa- | Trypsin | Shine et al. (1980)
-Ile-Glu-Gly-Arg*Xaa- | Factor Xa | Nagai et al. (1985)
-Pro-Xaa*Gly-Pro-Yaa- | Collagenase | Lee & Ullrich (1984)
-Arg*Xaa- | Clostripain | Bennett et al. (1984)




(Szoka et al., 1986). Low pH treatment was performed in
the presence of guanidinium chloride. The bovine growth
hormone released bound to growth hormone receptors
in vitro. However, this cleavage regime leaves proteins
with an uncharacteristic N-terminal proline residue, and
the acid conditions used may cause amide loss from 
asparagine residues. 

The examples described above illustrate the fact that
insoluble fusion proteins may have to be cleaved in the 
presence of denaturant. This is a consideration when an 
enzymically cleavable site is engineered. Clostripain 
cleavage of CAT-calcitonin is possible, as the enzyme 
is stable in up to 6 M-urea (Bennett et al., 1984).
Carboxypeptidase B, used to cleave urogastrone-
polyarginine, is stable in up to 5 M-urea (Sassenfeld &
Brewer, 1984). 

The longer the recognized cleavage site, the less likely 
it is to be found in the coding sequence of recombinant 
protein. A specific example is the cleavage sequence 
recognized by collagenase (Table 2) which has been 
introduced into TrpE-IGF and TrpE-EGF fusion 
proteins (Lee & Ullrich, 1984). A tetrapeptide sequence, 
-Ile-Glu-Gly-Arg-, precedes the cleavage sites of 
factor Xa. Both human β-globin (Nagai et al., 1985) and
human myoglobin (Varadarajan et al., 1985) have been
fused via this sequence to a λcII protein fragment. The
resulting fusion proteins were insoluble and inclusion
bodies were isolated. Triton-washed inclusions were
solubilized in urea. β-Globin was purified further by
ion-exchange chromatography and gel filtration.
Denaturant was removed before cleavage with factor Xa.
Refolded β-globin was reconstituted with haem and
α-globin. The oxygen-binding properties of the recombinant
haemoglobin were essentially the same as those of
authentic human haemoglobin (Nagai et al., 1985). 
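
The argument that a longer recognition sequence is less likely to recur within the product can be illustrated with a back-of-envelope estimate. The sketch below assumes, purely for illustration, that all 20 amino acids occur independently and with equal frequency, which real proteins do not satisfy. The 150-residue product length is hypothetical.

```python
def expected_chance_sites(protein_length: int, site_length: int,
                          alphabet_size: int = 20) -> float:
    """Expected number of chance matches to one specific site of `site_length`.

    ASSUMPTION: residues are independent and uniformly distributed over
    `alphabet_size` amino acids. This is a crude approximation.
    """
    if protein_length < 0 or site_length < 1 or alphabet_size < 1:
        raise ValueError("lengths must be non-negative and sizes at least 1")
    # Number of positions at which a site of this length could start.
    windows = max(protein_length - site_length + 1, 0)
    return windows / alphabet_size ** site_length


if __name__ == "__main__":
    length = 150  # HYPOTHETICAL product length
    for k, label in [(1, "single residue, e.g. Met"),
                     (2, "dipeptide, e.g. Asp-Pro"),
                     (4, "tetrapeptide, e.g. Ile-Glu-Gly-Arg")]:
        print(f"{label}: {expected_chance_sites(length, k):.5f} expected sites")
```

On these assumptions a single-residue site is expected several times in a product of this size, whereas a specific tetrapeptide is expected far less than once, which is the sense in which the factor Xa sequence approaches a unique site.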

In the protocol for myoglobin, inclusion bodies were 
isolated and washed. The intact fusion proteins were 
reconstituted with haem in the presence of urea. This 
allowed cleavage by trypsin in place of factor Xa, 
because the holoenzyme is resistant to proteolytic attack. 
Dialysis against Tris buffer, pH 8.0, was used to remove 
the trypsin. Ion-exchange chromatography and gel 
filtration were used to purify the protein further. Using 
the trypsin cleavage process, myoglobin has been 
produced on the gram scale (Varadarajan et al., 1985). 

If a proenzyme such as prochymosin is expressed as a
fusion, insertion of a specific cleavage site is not
required. At acid pH, prochymosin is autocatalytically
processed to the chymosin and the fusion protein is
removed with the pro-peptide (Nishimori et al., 1984).

There are intracellular fusion proteins which are 
soluble. Examples include β-galactosidase fusions with
HBV pre-S2 region (Offensperger et al., 1985) and
cholera toxin CTP3 (Jacob et al., 1985), λcII protein
sequences fused to α1-antitrypsin (Courtney et al., 1984)
and λN protein sequences fused to HSV-thymidine
kinase (Waldman et al., 1983). The latter is an interesting
example of a fusion protein in which there was no
specific cleavage site. However, the fusion was correctly
processed by an E. coli proteinase during cell lysis,
liberating active thymidine kinase. α1-Antitrypsin was
not completely soluble, and approx. 60% of the fusion 
protein was insoluble and inactive (Courtney et al., 
1984). 

Fusion proteins can be constructed to facilitate 
purification. A bacterial sequence can be selected that 
codes for a polypeptide which can be isolated by affinity 
chromatography. The β-galactosidase-HBV pre-S2
fusion synthesized in E. coli was purified using
p-aminophenyl-β-D-thiogalactoside-Sepharose. In a
single step, the protein was purified more than 30-fold to
>90% homogeneity (Offensperger et al., 1985). Substrate
affinity chromatography can also be used to purify 
polypeptides fused to CAT (Bennett et al., 1984), an 
appropriate substrate being acetyl-CoA. The latter 
approach is limited to soluble fusion products. Typically, 
cleavage sites are needed to allow the production of free 
polypeptide. 

A different approach has been taken in designing the 
polyarginine purification fusion (Sassenfeld & Brewer,
1984). A synthetic gene sequence coding for polyarginine
was fused to the 3' end of the urogastrone gene. 
Cation-exchange chromatography was used to purify 
the positively charged recombinant fusion protein. This 
is an effective purification step, as most bacterial proteins 
are acidic, and are negatively charged at the pH of 5.5 
which was used. The polyarginine sequence can be 
cleaved using carboxypeptidase B to yield free urogas- 
trone. At this stage, a second cation-exchange step retains 
the contaminating proteins while the urogastrone flows 
straight through. The urogastrone-polyarginine was 
insoluble, so the purification and proteolytic cleavage 
with carboxypeptidase B was performed in the presence 
of urea. It was suggested that fusion of this peptide to 
other recombinant polypeptides would allow the same 
purification methodology to apply (Brewer & Sassenfeld,
1985). However, the recombinant polypeptide to which
the polyarginine is fused may interact with, and restrict 
the availability of the polyarginine to bind to, a matrix. 
Another important factor is the efficiency with which the 
polyarginine can be removed to leave the correct 
C-terminus. 

Direct expression 

The major purification problem for directly expressed 
products is the development of techniques to release 
them from aggregates into stable active and soluble 
forms. The first step is to solubilize the proteins and 
conditions are used which, in vitro, will denature native 
proteins. The solubilization agents used include: (a)
5-8 M-guanidinium chloride (insulin A and B chains,
bovine growth hormone and urokinase), (b) 6-8 M-urea
(IgG heavy and light chains, prochymosin, IFN-γ, and
salmon growth hormone), (c) detergents (IFN-β and
IL-2), (d) alkaline pH (> 9.0) (prochymosin and chicken 
growth hormone), and (e) organic solvents (bacterio- 
phage T4 regA protein) (see Table 1 for references). 

Solubilization is therefore achieved by disrupting 
non-covalent hydrogen bonds, ionic or hydrophobic 
interactions and unfolding the polypeptides. The effect- 
iveness of a particular solvent is likely to differ between 
proteins, being dependent on the nature of the
polypeptides themselves. There are a number of 
variables in the protocols published, including pH, 
temperature, time and ionic environment. Another 
consideration is the ratio of denaturant to protein 
(Marston et al., 1984), which can affect the efficiency of 
the overall process. 

Solubilized proteins can be purified in the presence of 
denaturant. Proteins can be subjected to ion-exchange or 
gel filtration, if urea, alkaline pH or non-ionic detergents 
have been used (Gill et al., 1985; George et al., 1985). 
Since guanidinium chloride is charged, gel filtration but 
not ion-exchange chromatography may be used (Builder 
& Ogez, 1985; Gill et al., 1985). Purification has also been
achieved by using high-speed centrifugation to remove
contaminating microbial proteins (Builder & Ogez, 1985)
and by preferential organic extraction of recombinant
proteins into butan-2-ol or 2-methylbutan-1-ol (Konrad
& Lin, 1984; Koths et al., 1985). 

Having disrupted the aggregated polypeptides, condi- 
tions must be adjusted to allow refolding. The recovery 
of high yields of activity depends on the polypeptides 
refolding with the formation of the correct intramolecular 
interactions, including disulphide bonds. Dialysis has 
been used, successfully, to remove denaturant and 
generate soluble, active bovine growth hormone and 
urokinase (George et al., 1985; Winkler et al., 1985),
although for urokinase it was necessary to maintain the
protein concentration at or below a critical level
(Winkler et al., 1985). In contrast, dialysis of IFN-γ
generated both active monomeric and inactive aggregated
protein (Arakawa et al., 1985). The low yields of active
prochymosin obtained using dialysis (McCaman et al.,
1985) were much improved by using dilution to reduce
the urea concentration (Marston et al., 1984). These data
are consistent with the results of studies of authentic
protein folding in vitro which show that protein
concentration is an important parameter in optimizing
the yield of correctly folded protein (London et al., 1974;
Mozhaev & Martinek, 1982). The dilution must be such 
that intramolecular interactions occur in preference to 
intermolecular interactions. 
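
Why dilution favours refolding can be illustrated with a deliberately simple kinetic picture, which is an assumption of this sketch and is not taken from the studies cited: unfolded protein U either folds in a first-order step (rate k_fold x [U]) or aggregates in a second-order step (rate k_agg x [U]^2). Integrating these two competing rates gives a final folded fraction of ln(1 + x)/x, where x = k_agg x [U]0 / k_fold. The rate constants and concentrations below are arbitrary illustrative values.

```python
import math


def refolding_yield(initial_conc: float, k_fold: float, k_agg: float) -> float:
    """Fraction of protein that folds when first-order folding competes with
    second-order aggregation (ILLUSTRATIVE model, not from the cited work).

    From dU/dt = -k_fold*U - k_agg*U**2 and dN/dt = k_fold*U, the folded
    fraction at completion is ln(1 + x) / x with x = k_agg*initial_conc/k_fold.
    """
    if initial_conc <= 0 or k_fold <= 0 or k_agg < 0:
        raise ValueError("initial_conc and k_fold must be > 0 and k_agg >= 0")
    if k_agg == 0:
        return 1.0  # no aggregation pathway: everything folds
    x = k_agg * initial_conc / k_fold
    return math.log1p(x) / x


if __name__ == "__main__":
    # Arbitrary units; only the trend with concentration is meaningful here.
    for conc in (0.01, 0.1, 1.0, 10.0):
        print(f"initial conc {conc:>5}: folded fraction "
              f"{refolding_yield(conc, k_fold=1.0, k_agg=1.0):.3f}")
```

In this picture the folded fraction approaches 1 as the starting concentration falls and declines steadily as it rises, in line with the observation that dilution improved the recovery of active prochymosin.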

The pH used during refolding can also affect the yield 
obtained. Exposure to alkaline pH ( > 9.0) without the 
use of urea or guanidinium chloride has been used to 
unfold proteins, renaturation being effected by reduction 
of pH (Lowe et al., 1984). However, dilution from urea
into buffers at alkaline pH was used to produce active
salmon growth hormone (Sekine et al., 1985) and
prochymosin (Marston et al., 1984). In many of the
examples quoted by Olsen (1985), alkaline pH was used
at some stage of the refolding processes. pH may be
critical for correct disulphide bond
formation, since thiol-disulphide interchange proceeds
more rapidly at alkaline pH (Freedman & Hillson, 1980). 

In a series of three patents, an unfolding and refolding 
process was described based on the use of 'strong'
denaturants followed by the use of a 'weak' denaturant, 
urea (Builder & Ogez, 1985; Olsen, 1985; Olsen & Pai, 
1985). Guanidinium chloride was classed as a 'strong' 
denaturant. Unusually, Triton, SDS and chaotropic salts 
such as thiocyanate were classified in this category. It was 
suggested that, after unfolding, transfer into low 
concentrations of urea permits refolding of the proteins 
into forms approximating their native state. This process 
was used to refold FMDV capsid protein, porcine 
growth hormone, prochymosin (prorennin) and 
urokinase. 

Only the existence of non-covalent interactions in 
inclusion bodies has been discussed so far. In certain 
protocols, used successfully to solubilize recombinant 
polypeptides, thiol reagents were used in conjunction 
with the denaturants (Cabilly et al., 1984; George et al.,
1985; Koths et al., 1985; Vogel et al., 1985; Chang et al.,
1985; Winkler et al., 1985). If the inclusion of thiol
reagents is essential for the recovery of activity, covalent
disulphide bonds may exist in the aggregates. Many of the
eukaryotic polypeptides insoluble when expressed in the
E. coli cytoplasm are normally secreted by their natural
cells and contain disulphide bonds which form between
cysteine residues during the process of secretion. In
common with a number of bacteria, E. coli maintains its
cytoplasm in a reduced state (Fahey et al., 1977). In 
limited studies, bacterial cytoplasmic proteins were found 
to have a low cysteine content and to contain few 
disulphide bonds (Pollock & Richmond, 1962; Fahey et 
al., 1977). Taking into consideration the high level at 
which eukaryotic polypeptides are synthesized, and the 
potential number of disulphide bonds, it would seem 
unlikely that these bonds form in vivo in the cytoplasm 
of E. coli. It is probable that they form on exposure to 
air, during cell lysis, with many of the disulphide bond 
arrangements being incorrect. However, when prochym- 
osin was expressed directly in E. coli, a small proportion 
of oxidized monomer was shown to exist in intact cells. 
It was suggested that these disulphide bonds may have 
formed in an oxidizing microenvironment within the 
inclusion bodies (Schoemaker et al., 1985).

If intramolecular or intermolecular disulphide bonds
in inclusion bodies are incorrect, then disruption will be 
an essential component in the recovery of activity. Thiol 
reagents have been included in denaturation buffers and 
maintained in all the buffers used up to the refolding 
stage (Olsen, 1985). Alternatively, reversible S-sulphona-
tion (Katsoyannis et al., 1966) has been used (Cabilly
et al., 1984; Olsen, 1985). Further formation of 
disulphide bonds was prevented until the redox conditions 
were altered to allow thiol-disulphide interchange 
during the refolding step. 

It should be noted that reduction may not be essential 
for solubilization or refolding, as exemplified by 
prochymosin (Marston et al., 1984) and growth hormone
(George et al., 1985; Gill et al., 1985). There is no
evidence that incorrect disulphide bonds cause aggrega-
tion. There is limited evidence that some disulphide
bonds form in vivo (Schoemaker et al., 1985) but they are
clearly a feature of the isolated inclusion bodies.

The disruption of disulphide bonds may be a 
consideration in the denaturation stage, and conversely 
the formation of correct disulphide bonds may be critical 
during refolding. Growth hormones contain two disul-
phide bonds per molecule. It is interesting to note that
while thiol reagents may be included during the
unfolding step for bovine growth hormone (George 
et al., 1985) none are used in the refolding step for bovine 
and chicken growth hormones (George et al., 1985; Gill 
et al., 1985). Less than 2% of the refolded growth 
hormone preparations were in an aggregated form. Thiol 
reagents were omitted entirely in unfolding and refolding 
salmon growth hormone (Sekine et al., 1985). Thus these 
growth hormones appear to form native disulphide bonds 
in the absence of exogenous thiol reagents. A similar 
observation was made for prochymosin (Marston et al.,
1984) and it was suggested that thiol-disulphide 
interchange was promoted by free thiol groups in the 
recombinant polypeptide. 

When exogenous addition of thiols is required to 
promote correct disulphide bond formation by thiol- 
disulphide interchange, a mixture of reduced and 
oxidized reagents, such as glutathione, may be included 
in the refolding buffer. This may follow denaturation in 
the absence (Winkler et al., 1985) or presence of thiol 
reagent (Olsen, 1985) or after S-sulphonation (Cabilly 
et al., 1984; Olsen, 1985). For IL-2, oxidation was
promoted during refolding using iodosobenzoic acid
(Koths et al., 1985).

The recovery of biological activity has been demon- 
strated for many of the refolded polypeptides described 
in this section. Examples include immunoglobulins (Boss
et al., 1984; Cabilly et al., 1984), IL-2 (Liang et al., 1985;
Malkovsky et al., 1985), bovine, chicken and salmon
growth hormones (George et al., 1985; Gill et al., 1985;
Sekine et al., 1985), prochymosin (Green et al., 1985) and
urokinase (Winkler et al., 1985). However, with few
exceptions (Boss et al., 1984; Cabilly et al., 1984;
Marston et al., 1984), quantification of the efficiency with
which the proteins are refolded is not provided. 

As described earlier, insoluble recombinant polypep- 
tides can be purified to different extents before or during 
refolding. Using either gel filtration or ion-exchange 
chromatography during refolding, growth hormones
>90% pure were obtained (Gill et al., 1985). Dialysis
was used as a final step in these processes, and 
precipitation was observed. If contaminating microbial 
proteins precipitated preferentially, then this may have 
enhanced the final purity of the growth hormones. 

If the isolation and washing of inclusion bodies yields 
high purity recombinant protein, only a limited number 
of chromatographic steps may subsequently be required 
to yield pure protein. For prochymosin, a single
ion-exchange column yielded protein >90% pure.
Acid-activation of the zymogen produces chymosin
>99% pure. The yield of active refolded material was
>40% (Marston et al., 1984). Refolded urokinase was
purified using a five-step process (Winkler et al., 1985).
Two of the column steps were affinity purification using 
benzamidine-Sepharose and the high yield from these 
columns was significant in generating the overall yield of 
32% of refolded protein. 
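
The dependence of overall recovery on the yield of each step can be illustrated numerically: the overall yield of a sequential process is the product of the fractional yields of its individual steps. In the Python sketch below the step yields are hypothetical (they are not the values reported by Winkler et al.) and the function name is an assumption introduced only for illustration.

```python
from math import prod


def overall_yield(step_yields):
    """Return the overall fractional yield of a sequential process.

    Each entry in step_yields is the fraction of material recovered
    in one step (0.0 to 1.0); the overall yield is their product.
    """
    if any(not 0.0 <= y <= 1.0 for y in step_yields):
        raise ValueError("each step yield must be a fraction between 0 and 1")
    return prod(step_yields)


# Hypothetical yields for a five-step process (illustrative values only)
example_steps = [0.80, 0.80, 0.80, 0.80, 0.80]
example_overall = overall_yield(example_steps)
```

With these assumed figures, five steps each recovering 80% give an overall yield of about 33%, which shows why high-yielding affinity steps matter in a multi-step process.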

Some effort has been directed at eliminating disulphide 
bond formation in order to facilitate the recovery of 
active, soluble protein from E. coli. IFN-β and IL-2 each
contain three cysteine residues, and a single disulphide
bond is thought to exist in each native protein. When
expressed in E. coli, IFN-β and IL-2 are aggregated and
insoluble. Mutant proteins or 'muteins' of IFN-β and
IL-2 were synthesized in E. coli after site-directed
mutagenesis, in which one cysteine residue was either
deleted or replaced by a serine residue (Mark et al., 1983;
Koths et al., 1985). The polypeptides still formed
insoluble aggregates and required denaturation and 
refolding. The mutant proteins were biologically active 
once refolded. In contrast, deletion of the three 
N-terminal residues (Cys-Tyr-Cys) of IFN-γ was re-
ported to result in a more soluble E. coli protein (Allet,
1985). Some characterization studies have been per-
formed using the refolded protein (Hsu & Arakawa, 1985),
but details of the purification process, which is in 
preparation, are required to allow the effect of this 
deletion to be interpreted. Another example of a 
modification which affected solubility is provided by 
Moloney murine leukaemia virus (MMLV) reverse 
transcriptase. This enzyme was predominantly insoluble 
when expressed in E. coli. Insertion of a termination 
codon earlier in the 3' sequence deleted a hydrophobic 
C-terminal region and resulted in the expression of a 
totally soluble protein (Kotewicz et al., 1985). This result
implicates hydrophobic interactions in aggregation, 
consistent with results obtained when cloned viral 
glycoprotein genes are expressed in E. coli (Harris, 1984). 

Only insoluble, directly expressed polypeptides have 
been discussed. There are several examples of polypep- 
tides expressed in soluble, active forms in E. coli. For 
certain polypeptides only a small proportion of the 
expressed protein is soluble, but this facilitates charac- 
terization of the recombinant product. Receptor binding 
of human complement fragment C5a (Mandecki et al.,
1985) and biological activity of FMDV proteinase
(Klump et al., 1984) were demonstrated using unpurified
lysis supernatants. Pure retroviral p24 gag protein was
obtained from E. coli in a single immunopurification step
(Dowbenko et al., 1985) while a five-step process was
used to produce human growth hormone (Olsen et al.,
1981). Other examples of soluble proteins are human
IFN-α (Staehelin et al., 1981; De Maeyer et al., 1982),
bovine IFN-α (Grosfeld et al., 1985), chicken triose-
phosphate isomerase (Straus & Gilbert, 1985), human
lymphotoxin (Gray et al., 1984), human TNF (Pennica
et al., 1984) and murine TNF (Pennica et al., 1985).

There are no obvious common features to explain why 
these proteins are soluble. Expression levels varied from
<1% to 25% of total cell protein. Some, but not all, of
the authentic proteins are naturally glycosylated. The 
fact that none of these proteins contain large numbers 
of disulphides may be significant. However, there are 
several examples of polypeptides containing one or two 
disulphides which are totally insoluble (Table 1). 

Authentic proteins can be unfolded and refolded 
in vitro with little loss of activity. Yields are critically 
dependent on a number of parameters, particularly 
protein concentration, and for disulphide bond-contain- 
ing proteins, pH. Such studies have led to the 
conclusion that the amino acid sequence of polypeptides 
contains the information required for folding (Anfinsen, 
1973). Why is it, therefore, that eukaryotic polypeptides 
synthesized in E. coli and having the correct amino acid 
sequence fail to fold correctly? Insolubility does not 
result just because the proteins are expressed at a high 
percentage of total cell protein, as was observed for 
overexpressed E. coli proteins. There are examples of 
eukaryotic polypeptides expressed to levels of 1% or less
which are insoluble (Table 1). 

The mechanism by which proteins fold in vivo is still 
unknown. From studies in vitro it is evident that the
amino acid sequence of each protein contains the
information required for folding, but it is not apparent 
which residues specify the folding information. Another 
consideration is what influence, if any, the chemical 
environment within the cell has on protein folding. In the 
absence of such information it is only possible to 
speculate why some eukaryotic polypeptides fail to fold 
correctly in E. coli. 

For proteins which contain disulphide bonds, forma- 
tion of these bonds may be an essential component of the 
folding process. The inability to form disulphide bonds 
in the reduced environment of the E. coli cytoplasm may 
thus prevent folding. However this may not be the only 
reason why these proteins do not fold correctly, and 
certainly does not explain why eukaryotic proteins which 
lack disulphide bonds can also fail to fold in E. coli. 

While there is little data available about the chemical 
environment in E. coli and eukaryotic cells in vivo, there 
are some obvious differences, notably the lack of 
subcellular compartments in E. coli. Therefore it is 
possible for the nascent recombinant polypeptide chains 
to come into contact with low-Mr components or
macromolecules in the E. coli cytoplasm which are 
isolated in organelles in their natural cell environment. 
Interaction with these components could interfere with 
or inhibit protein folding. 

The structure of the folding eukaryotic polypeptides 
may also be important if differences in structure result in 
the proteins being recognized as foreign. It may be for 
this reason that the proteins are sequestered into 
aggregates as discussed earlier. In this respect it is
interesting that chicken triosephosphate isomerase, a
cytoplasmic protein of Mr 53000, which is soluble in
E. coli, has a structure which is highly conserved across 
prokaryotes and eukaryotes (Straus & Gilbert, 1985). 

Our knowledge of the mechanisms by which proteins 
fold in vivo is limited and, as highlighted in a recent 
review (King, 1986), solving the protein folding problem 
is important for development in this area of 
biotechnology. 



SECRETED PROTEINS 

E. coli possesses two cell membranes, the outer 
membrane and the cytoplasmic membrane, which are 
separated by the periplasmic space. Proteins located in 
the periplasmic space or the outer membrane are 
synthesized in the cytoplasm and exported through the 
cytoplasmic membrane (Michaelis & Beckwith, 1982). 
The proteins are synthesized as precursors, with an 
N-terminal signal sequence which may be cleaved during
secretion. Signal sequences are an absolute requirement 
for export (Silhavy et al., 1983) but there may also be 
information in the mature protein sequence which affects 
its localization (Tommassen et al., 1983; Ghrayeb et al.,
1984; Freundl et al., 1985).

Very few E. coli proteins are secreted into the 
extracellular medium, across the outer membrane 
(Muller et al., 1983). While a signal sequence is required
for periplasmic secretion there is no evidence that such 
sequences are required for secretion across the outer cell 
membrane. Enterotoxins contain signal sequences which 
are cleaved during secretion (Dallas & Falkow, 1980), 
while haemolysin is released extracellularly without 
cleavage of the signal peptide. The mechanism by which 
proteins cross the outer membrane is unclear but appears
to be more complex than for periplasmic secretion. 

Expression via secretion offers several advantages over 
intracellular expression, as discussed in a recent review 
comparing E. coli, yeast and Bacillus subtilis as secretion 
systems for foreign proteins (Nicaud et al., 1986). If the 
signal sequence is correctly processed, the N-terminus of 
the recombinant protein will be identical to the authentic 
product. Secretion of proteins into the periplasm can 
prevent the degradation of the polypeptides; for 
example, proinsulin located in the periplasm was 10-fold 
more stable than when located in the cytoplasm 
(Talmadge & Gilbert, 1982). Disulphide bond formation 
in E. coli has been shown to occur simultaneously with 
secretion (Pollitt & Zalkin, 1983). Folded eukaryotic 
products with correct disulphide bonds can therefore be 
produced, as demonstrated for proinsulin (Emerick 
et al., 1985) and EGF (Oka et al., 1985).

Periplasmic secretion 

It was suggested that E. coli cannot process eukaryotic 
signal peptides efficiently (Devos et al., 1983) but more 
recent evidence is contradictory. The eukaryotic signal 
of IgG light chain both initiated secretion and was 
correctly processed (Zemel-Dreasen & Zamir, 1984) but 
when the prokaryotic β-lactamase signal sequence was
placed in front of the IgG light chain signal sequence 
only the bacterial signal peptide was processed during 
secretion. The natural leader sequences of urokinase 
(Jacobs et al., 1985) and human growth hormone (Gray 
et al., 1985) were both processed by E. coli, though 
secretion was only demonstrated with the latter. 
Prokaryotic signal sequences from β-lactamase (Villa-
Komaroff et al., 1978), OmpA protein (Ghrayeb et al.,
1984) and alkaline phosphatase (phoA) (Oka et al., 1985)
have been used successfully to secrete eukaryotic 
polypeptides into the periplasm of E. coli. 

To assess whether proteins have been secreted, E. coli 
must be converted to spheroplasts to release the contents 
of the periplasm. A β-lactamase-proinsulin fusion was
found to secrete over 90% of the proinsulin synthesized
into the periplasmic space (Chan et al., 1981). The level
of expression was 0.01% of total cell protein. Using
genetic and physiological manipulations, the level of
expression was improved to 0.5% of total protein
(Emerick et al., 1984). Human EGF has also been
secreted into the periplasm, using the phoA signal
peptide. The strains expressing the highest level of the
polypeptide secreted 1-2 mg/l of culture (approx.
0.01-0.02% of total cell protein) (Oka et al., 1985).

In both the examples cited above, the signal sequences 
were correctly processed. The expression levels are 
apparently low, when compared with the levels achieved 
by direct expression in the cytoplasm. However, the 
polypeptides are soluble and biologically active. If the Mr
of the polypeptide secreted is small, then even at
expression levels of 0.02%, a significant number of
molecules are formed (1.8 x 10^5 molecules/cell for EGF).
It has been suggested that for insulin an expression level
of 1% would be adequate for commercial production
(Emerick et al., 1984).
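
The order of magnitude of the per-cell figure quoted above can be checked with a short calculation. The following Python sketch converts a culture titre into molecules per cell. The Mr of EGF (taken here as roughly 6000) and the culture density (taken as 10^12 cells/l) are assumptions made for illustration and are not values given in the text; the function name is likewise hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_cell(titre_mg_per_l, relative_molecular_mass, cells_per_l):
    """Estimate the number of polypeptide molecules produced per cell.

    titre_mg_per_l: amount of polypeptide per litre of culture (mg/l)
    relative_molecular_mass: Mr of the polypeptide (numerically g/mol)
    cells_per_l: number of cells per litre of culture
    """
    if relative_molecular_mass <= 0 or cells_per_l <= 0:
        raise ValueError("Mr and cell density must be positive")
    grams_per_l = titre_mg_per_l / 1000.0
    moles_per_l = grams_per_l / relative_molecular_mass
    return moles_per_l * AVOGADRO / cells_per_l


# Assumed inputs (not taken from the source): Mr ~6000, 1e12 cells/l
egf_estimate = molecules_per_cell(2.0, 6000.0, 1e12)
```

With these assumed inputs a titre of 2 mg/l corresponds to roughly 2 x 10^5 molecules per cell, the same order of magnitude as the value cited for EGF.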

Periplasmic proteins are normally only 4% of total cell 
protein (Nossal & Heppel, 1966); therefore, less 
extensive purification of recombinant proteins is required 
than for proteins located in the cytoplasm. EGF has been 
purified to apparent homogeneity in a two-step process:
gel filtration followed by reverse-phase h.p.l.c. (Oka
et al., 1985).

Secretion of eukaryotic polypeptides can prove toxic 
to E. coli, causing cell lysis. This was observed with 
OmpA-FMDV VP1 fusions (Henning et al., 1983) and 
β-lactamase-proinsulin fusions (Brosius, 1984). It was
suggested that the proteins may become anchored in the 
cytoplasmic membrane, blocking export completely. 
This was previously observed for bacterial lamB-lacZ 
fusions (Emr et al., 1980). 

Extracellular secretion 

Since the mechanism by which proteins are secreted 
into the extracellular medium by E. coli is poorly 
understood, limited effort has been directed at developing 
this expression system. In an attempt to secrete the 
eukaryotic polypeptide, β-endorphin, a fusion with the
signal peptide and part of the N-terminal sequence of the
OmpF protein was constructed (Nagahari et al., 1985).
The β-endorphin was secreted into the culture medium,
with the signal peptide correctly processed. Secretion was
dependent on the presence of the signal sequence. The
level of β-endorphin in the medium was estimated to be
1-2 mg/litre of culture, which represented >99% of
the total β-endorphin in the periplasm and medium.
Negligible amounts were detected in the cytoplasm. In
contrast, the percentage of the periplasmic enzymes
β-lactamase and alkaline phosphatase detected in the
medium was approx. 14% and 11% respectively,
indicating a certain level of periplasmic leakage or cell
lysis. The mechanism of secretion was not established,
but the N-terminal region of the OmpF protein, included
in the fusion, may play a role in secretion. 

The β-endorphin polypeptides were purified from the
medium by gel filtration through Sephadex G-10 and
reverse-phase h.p.l.c. No indication of starting purity was
given, but column profiles (monitored at 230 nm) showed
a number of peaks in addition to those identified as
β-endorphin. Sequence analysis and amino acid compo-
sition of the purified polypeptides revealed C-terminal
heterogeneity, which could result from premature ter- 
mination of transcription or proteolytic degradation. 

Another eukaryotic polypeptide secreted at significant 
levels into the medium by E. coli was Aα-fibrinogen
(Lord, 1985). This polypeptide has a predicted Mr of
67000. When expressed as a fusion with the β-lactamase
signal, 4 mg/l were located in total cell lysates, and
13 mg/l in the culture medium, but comparative levels of
cytoplasmic or periplasmic enzymes were not presented.
However, if levels of cell lysis are low, then these results
demonstrate that large polypeptides can be secreted into
the medium by E. coli. Preliminary results indicated a
correct N-terminus but some heterogeneity at the
C-terminus. 
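
As a rough illustration of how the distribution of a product between cells and medium can be expressed, the Python sketch below computes the fraction found in the medium. It assumes (this is not stated in the source) that the cell-lysate and culture-medium measurements do not overlap and together account for all of the product; the function name is hypothetical.

```python
def fraction_in_medium(medium_mg_per_l, cell_mg_per_l):
    """Return the fraction of total product found in the culture medium.

    Assumes the two measurements are non-overlapping and that their
    sum represents the total product per litre of culture.
    """
    total = medium_mg_per_l + cell_mg_per_l
    if total <= 0:
        raise ValueError("total product must be positive")
    return medium_mg_per_l / total


# Figures quoted in the text: 13 mg/l in medium, 4 mg/l in cell lysates
fibrinogen_fraction = fraction_in_medium(13.0, 4.0)
```

Under that assumption the quoted figures correspond to about 76% of the product being located in the medium. Without comparative data for marker enzymes, this figure alone does not distinguish secretion from cell lysis.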

CHARACTERISTICS OF THE E. COLI 
EXPRESSION SYSTEM 

There is concern over the use of E. coli as a production 
system because of the existence of endotoxins or 
lipopolysaccharide in the cell wall. In fact, lipopoly- 
saccharide contamination could be introduced into any 
manufactured drug or biological regardless of the 
organism of origin. Lipopolysaccharide is pyrogenic and 
therefore must be removed to yield a safe product. There 
are several chromatographic techniques which can be 
used to remove lipopolysaccharide. Matrices which have
been used include polymyxin B-Sepharose, histamine-
Sepharose, substrate-analogue-Sepharose and DEAE-
Sepharose (Sofer, 1984). Many of these procedures are
commonly used in protein purification. 

As a result of being synthesized in E. coli, recombinant
eukaryotic polypeptides can differ from their authentic
counterparts. The possibility of an additional methionine
residue at the N-terminus of directly expressed polypep-
tides was mentioned earlier. There are also a number of 
eukaryotic post-translational modifications of polypep- 
tides which E. coli does not perform. These include 
glycosylation, acetylation and amidation. E. coli has a 
requirement for an ATG (or GTG) codon at the 5' end 
of a gene, as translation is initiated by N-formyl-
methionine. The methionine is deformylated in E. coli
during synthesis (Adams, 1968), but is not necessarily
cleaved. Thus directly expressed polypeptides may pos-
sess an uncharacteristic N-terminal methionine. E. coli
does possess an aminopeptidase with broad substrate 
specificity, but the efficiency of methionine removal from 
recombinant polypeptides is variable. The N-terminal
methionine residues were completely removed from
IFN-yff (Stebbing et al., 1982) and bovine growth
hormone (George et al., 1985), whilst there was
differential processing of IFN-α (Staehelin et al., 1981)
and human growth hormone (Olsen et al., 1981). With an
N-terminal sequence of Met-Ala-Pro-, IL-2 was comple-
tely unprocessed (Liang et al., 1985). However, when a
deletion mutant, lacking the three N-terminal residues of
the authentic protein (Ala-Pro-Thr-) was expressed, the
N-terminus was Met-Ser-, and all the methionine was
removed. Sequence effects were also observed with bovine
growth hormone, Met-Phe- being much less efficiently
processed than Met-Ala- (Seeburg et al., 1983). One
explanation is that residues near to the N-terminus cause
steric hindrance and thus affect the efficiency of
methionine removal (Liang et al., 1985). The advantage
of using fusion proteins or secretion is that processing
yields the correct N-terminal residue. It is possible for the
presence of an N-terminal methionine to have no effect
on biological activity, as demonstrated for human growth
hormone and IL-2 (Olsen et al., 1981; Liang et al., 1985).

E. coli does not possess the enzymes necessary for 
glycosylating polypeptides. There are however docu- 
mented examples of recombinant proteins synthesized 
in E. coli which are biologically active despite the fact that 
they are not glycosylated. These include IFN-β, IFN-γ
(Edge & Camble, 1984) and IL-2 (Liang et al., 1985). 
These results not only demonstrate that carbohydrate 
residues are not essential for biological activity, but also 
that when the expressed product is insoluble they are not 
essential for refolding. Acetylation and amidation are two 
other post-translational processes which are often 
essential for the biological activity of some eukaryotic 
polypeptides. It has therefore been necessary to develop 
systems in vitro to perform these reactions on recombinant 
polypeptides isolated from E. coli. Acetylation in vitro of 
desacetylthymosin (Wetzel et al., 1981) and amidation of
calcitonin (S. K. Rhind, personal communication) have 
been achieved. 

Recombinant polypeptides which have residues miss- 
ing at the N-terminus or C-terminus may be produced 
in E. coli together with the intact molecules. These may 
arise as a result of late starts or premature termination 
in transcription. Alternatively, proteolysis may have
occurred in vivo or in vitro. The product of the lon gene
in E. coli, protease La, has an important role in the
degradation of abnormal proteins. The fact that lon gene
transcription increases with the synthesis of recombinant
polypeptides (Goff & Goldberg, 1985) suggests that the
use of lon⁻ strains might decrease degradation. Results
have varied from no effect (Emerick et al., 1984) to 
increased expression of 150-fold (Boss et al., 1984). Full 
consideration of this subject is beyond the scope of this 
Review, but it is worth noting that there can be 
considerable variation in the accumulation of recombin- 
ant proteins in different E. coli strains (Schoemaker et al., 
1985). The extent to which the intact molecules must be 
purified from incorrect length molecules will depend 
upon the use for which the recombinant polypeptides are 
being produced. 

CONCLUDING REMARKS 

Because the knowledge of E. coli genetics was well 
advanced, it became the focus for the development of 
recombinant DNA techniques and historically, for the 
same reasons, E. coli was selected as the organism in 
which to express eukaryotic polypeptides. From the 
information described in this Review, it is clear that there 
are both advantages and disadvantages to the use of this 
expression system. 

Using intracellular expression, gram amounts of 
polypeptide can be produced per litre of fermentation, 
although the products are commonly insoluble. While 
secretion of proteins from E. coli obviates the insolubility 
phenomenon, until recently expressed yields have been 
low. The production of milligram amounts per litre of 
β-endorphin (Nagahari et al., 1985) and Aα-fibrinogen
(Lord, 1985) shows that E. coli has potential as a
secretion system. For further development this secretion 
mechanism must be elucidated. 

Denaturation and refolding has been used to produce 
a number of biologically active eukaryotic polypeptides 
from E. coli. These include proteins with complex 
multidomain structures such as urokinase (Winkler et al.,
1985) and tissue-type plasminogen activator (Harris 
et al., 1986). There are currently insufficient data to allow 
detailed discussion of the efficiency with which many of 
the recombinant polypeptides refold. However, the levels 
of biological activity recovered in the examples described 
were clearly adequate for analytical studies. 

The fact that recombinant polypeptides could become 
modified during the unfolding and refolding process 
must be considered. Exposure to high concentrations of 
denaturants such as urea and high pH must be
minimized because of the possibility of chemical 
modification. To characterize the proteins, the assign- 
ment of disulphide bonds for recombinant and authentic 
products has been compared (Kohr et al., 1982). Gross 
conformation can be probed by c.d., while detailed 
structural information can be provided by n.m.r. and 
X-ray crystallography. In this respect the results for 
refolded E. coli β-globins are encouraging. X-ray
diffraction analysis at 0.28 nm (2.8 Å) resolution revealed
only slight electron density differences between authentic
and mutant recombinant products. This difference could
be explained by the Cys→Ser change engineered by
site-directed mutagenesis (Nagai et al., 1985). 

In conclusion, E. coli can be used to produce larger 
amounts of eukaryotic polypeptides than can be isolated 
from natural cell sources. The removal of lipopoly-
saccharide was highlighted earlier as an important aim of 
purification. If adequate precautions are taken, such as 
the use of specific chromatographic steps coupled with
the use of pyrogen-free water and maintenance of 
equipment in a pyrogen-free state, then lipopoly- 
saccharide can be reduced to an acceptably low level. 
This is borne out by the fact that there are therapeutic 
recombinant polypeptides from E. coli in use. IL-2, IFN-α,
IFN-β, IFN-γ and TNF are among the recombinant
polypeptides currently in clinical trials, while insulin and
human growth hormone are examples of E. coli-derived
polypeptides which are now in clinical use. 

I am grateful to my colleagues Peter Lowe, Tim Harris, 
Spencer Emtage, Rick Lilley and Martyn Robinson for their 
constructive criticisms. I would like to thank Margaret Turner 
for her patience in typing this manuscript. Thanks also to Peter 
Dumbell and Alan Lyons for assistance in the preparation of 
Figures. The data of Sofer (1984) was used by permission of 